Overview

Dataset statistics

Number of variables: 5
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 32.4 KiB
Average record size in memory: 33.1 B

Variable types

Numeric: 3
Boolean: 1
Categorical: 1

Alerts

df_index has unique values (Unique)
interestedPartyStatementID has unique values (Unique)
subjectStatementID has unique values (Unique)

Reproduction

Analysis started: 2022-06-01 21:24:14.221596
Analysis finished: 2022-06-01 21:25:05.660699
Duration: 51.44 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2805.055
Minimum: 9
Maximum: 5558
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 9
5th percentile: 263.85
Q1: 1420.5
Median: 2768
Q3: 4217.5
95th percentile: 5361.2
Maximum: 5558
Range: 5549
Interquartile range (IQR): 2797

Descriptive statistics

Standard deviation: 1633.179139
Coefficient of variation (CV): 0.582227136
Kurtosis: -1.20629847
Mean: 2805.055
Median Absolute Deviation (MAD): 1412.5
Skewness: -0.01646770117
Sum: 2805055
Variance: 2667274.1
Monotonicity: Not monotonic
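Several of the statistics above are derived from others: range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = (standard deviation)², and sum = mean × number of observations. The minimal Python check below recomputes them from the df_index figures in this section. The function name `derived_stats` is hypothetical and only serves as an illustration. Because the report prints rounded values, the recomputed CV and variance agree only to the displayed precision.

```python
def derived_stats(minimum, maximum, q1, q3, mean, std, n):
    """Recompute the derived summary statistics from the basic ones."""
    return {
        "range": maximum - minimum,   # spread between the extremes
        "iqr": q3 - q1,               # interquartile range
        "cv": std / mean,             # coefficient of variation
        "variance": std ** 2,         # variance is the squared standard deviation
        "sum": mean * n,              # total of all observations
    }


# Basic statistics copied from the df_index section of this report (n = 1000).
stats = derived_stats(minimum=9, maximum=5558, q1=1420.5, q3=4217.5,
                      mean=2805.055, std=1633.179139, n=1000)
for name, value in stats.items():
    print(f"{name}: {value:.10g}")
```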
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                  Count    Frequency (%)
3601                   1        0.1%
1946                   1        0.1%
3708                   1        0.1%
1133                   1        0.1%
4063                   1        0.1%
5479                   1        0.1%
5429                   1        0.1%
4125                   1        0.1%
3450                   1        0.1%
2928                   1        0.1%
Other values (990)     990      99.0%

Minimum 10 values

Value                  Count    Frequency (%)
9                      1        0.1%
13                     1        0.1%
19                     1        0.1%
28                     1        0.1%
30                     1        0.1%
32                     1        0.1%
34                     1        0.1%
37                     1        0.1%
39                     1        0.1%
51                     1        0.1%

Maximum 10 values

Value                  Count    Frequency (%)
5558                   1        0.1%
5557                   1        0.1%
5554                   1        0.1%
5548                   1        0.1%
5539                   1        0.1%
5538                   1        0.1%
5536                   1        0.1%
5532                   1        0.1%
5529                   1        0.1%
5525                   1        0.1%

interestedPartyStatementID
Real number (ℝ≥0)

UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.064471042 × 10^18
Minimum: 9.96661279 × 10^15
Maximum: 1.841131821 × 10^19
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 9.96661279 × 10^15
5th percentile: 9.643683988 × 10^17
Q1: 4.312639068 × 10^18
Median: 8.82941965 × 10^18
Q3: 1.374430733 × 10^19
95th percentile: 1.760085101 × 10^19
Maximum: 1.841131821 × 10^19
Range: 1.84013516 × 10^19
Interquartile range (IQR): 9.431668264 × 10^18

Descriptive statistics

Standard deviation: 5.28530985 × 10^18
Coefficient of variation (CV): 0.5830797876
Kurtosis: -1.19127615
Mean: 9.064471042 × 10^18
Median Absolute Deviation (MAD): 4.736985735 × 10^18
Skewness: 0.03800593563
Sum: 9.064471042 × 10^21
Variance: 2.793450021 × 10^37
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                  Count    Frequency (%)
2.039954664 × 10^18    1        0.1%
1.690565218 × 10^19    1        0.1%
8.428473491 × 10^18    1        0.1%
7.51977738 × 10^18     1        0.1%
9.050788094 × 10^18    1        0.1%
1.541499861 × 10^18    1        0.1%
1.322564417 × 10^19    1        0.1%
2.935879797 × 10^18    1        0.1%
7.860754785 × 10^17    1        0.1%
1.320258748 × 10^19    1        0.1%
Other values (990)     990      99.0%

Minimum 10 values

Value                  Count    Frequency (%)
9.96661279 × 10^15     1        0.1%
5.059297737 × 10^16    1        0.1%
5.709866048 × 10^16    1        0.1%
1.003764656 × 10^17    1        0.1%
1.106155461 × 10^17    1        0.1%
1.186848996 × 10^17    1        0.1%
1.47899958 × 10^17     1        0.1%
2.05819899 × 10^17     1        0.1%
2.407618482 × 10^17    1        0.1%
2.532792532 × 10^17    1        0.1%

Maximum 10 values

Value                  Count    Frequency (%)
1.841131821 × 10^19    1        0.1%
1.840987267 × 10^19    1        0.1%
1.834842509 × 10^19    1        0.1%
1.834243925 × 10^19    1        0.1%
1.833862702 × 10^19    1        0.1%
1.833233604 × 10^19    1        0.1%
1.832391368 × 10^19    1        0.1%
1.831339655 × 10^19    1        0.1%
1.830624871 × 10^19    1        0.1%
1.83037944 × 10^19     1        0.1%

interestedPartyIsPerson
Boolean

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

True: 912
False: 88

Common values

Value    Count    Frequency (%)
True     912      91.2%
False    88       8.8%

subjectStatementID
Real number (ℝ≥0)

UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.197044349 × 10^18
Minimum: 1.048368765 × 10^16
Maximum: 1.844161125 × 10^19
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 1.048368765 × 10^16
5th percentile: 8.59627042 × 10^17
Q1: 4.327400042 × 10^18
Median: 9.177306067 × 10^18
Q3: 1.370790338 × 10^19
95th percentile: 1.758487481 × 10^19
Maximum: 1.844161125 × 10^19
Range: 1.843112756 × 10^19
Interquartile range (IQR): 9.380503337 × 10^18

Descriptive statistics

Standard deviation: 5.387595119 × 10^18
Coefficient of variation (CV): 0.5857963618
Kurtosis: -1.240904045
Mean: 9.197044349 × 10^18
Median Absolute Deviation (MAD): 4.681183506 × 10^18
Skewness: 0.009705259091
Sum: 9.197044349 × 10^21
Variance: 2.902618117 × 10^37
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                  Count    Frequency (%)
2.352645712 × 10^18    1        0.1%
4.302773373 × 10^18    1        0.1%
1.530775994 × 10^19    1        0.1%
2.115959212 × 10^18    1        0.1%
4.941679117 × 10^18    1        0.1%
1.297618993 × 10^19    1        0.1%
6.049492525 × 10^18    1        0.1%
1.145302343 × 10^19    1        0.1%
1.570640351 × 10^19    1        0.1%
1.140430196 × 10^19    1        0.1%
Other values (990)     990      99.0%

Minimum 10 values

Value                  Count    Frequency (%)
1.048368765 × 10^16    1        0.1%
3.59029514 × 10^16     1        0.1%
4.31121121 × 10^16     1        0.1%
5.099757031 × 10^16    1        0.1%
7.032438087 × 10^16    1        0.1%
7.142126181 × 10^16    1        0.1%
1.011228621 × 10^17    1        0.1%
1.395670208 × 10^17    1        0.1%
1.518917238 × 10^17    1        0.1%
1.742856748 × 10^17    1        0.1%

Maximum 10 values

Value                  Count    Frequency (%)
1.844161125 × 10^19    1        0.1%
1.841474472 × 10^19    1        0.1%
1.839014401 × 10^19    1        0.1%
1.838610119 × 10^19    1        0.1%
1.836635891 × 10^19    1        0.1%
1.836011364 × 10^19    1        0.1%
1.835768645 × 10^19    1        0.1%
1.835643299 × 10^19    1        0.1%
1.833870042 × 10^19    1        0.1%
1.832686034 × 10^19    1        0.1%

minimumShare
Categorical

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
75.0: 567
25.0: 389
50.0: 44

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 4000
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
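The character statistics for minimumShare can be reproduced with Python's standard `unicodedata` module. Here is a minimal sketch, assuming the value counts listed under Common Values (567 × "75.0", 389 × "25.0", 44 × "50.0"). `CATEGORY_NAMES` is a hypothetical helper that covers only the two general categories occurring in this column. The standard library exposes neither Unicode scripts nor blocks, so a simple `ord(ch) < 128` test stands in for the ASCII block check.

```python
import unicodedata
from collections import Counter

# Value counts of minimumShare as reported under Common Values.
value_counts = {"75.0": 567, "25.0": 389, "50.0": 44}

# Hypothetical mapping from Unicode general-category codes to the report's labels;
# only the two codes that occur in this column are listed.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation"}

char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count  # every row holding this value contributes its characters

category_counts = Counter()
for ch, count in char_counts.items():
    category_counts[CATEGORY_NAMES[unicodedata.category(ch)]] += count

total = sum(char_counts.values())
all_ascii = all(ord(ch) < 128 for ch in char_counts)  # True if every character is ASCII

print("Total characters:", total)
for ch, count in char_counts.most_common():
    print(f"{ch!r}: {count} ({count / total:.1%})")
print(dict(category_counts), "ASCII only:", all_ascii)
```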

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 25.0
2nd row: 50.0
3rd row: 25.0
4th row: 25.0
5th row: 75.0

Common Values

Value    Count    Frequency (%)
75.0     567      56.7%
25.0     389      38.9%
50.0     44       4.4%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value    Count    Frequency (%)
0        1044     26.1%
5        1000     25.0%
.        1000     25.0%
7        567      14.2%
2        389      9.7%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       3000     75.0%
Other Punctuation    1000     25.0%

Most frequent character per category

Decimal Number

Value    Count    Frequency (%)
0        1044     34.8%
5        1000     33.3%
7        567      18.9%
2        389      13.0%

Other Punctuation

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    4000     100.0%

Most frequent character per script

Common

Value    Count    Frequency (%)
0        1044     26.1%
5        1000     25.0%
.        1000     25.0%
7        567      14.2%
2        389      9.7%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    4000     100.0%

Most frequent character per block

ASCII

Value    Count    Frequency (%)
0        1044     26.1%
5        1000     25.0%
.        1000     25.0%
7        567      14.2%
2        389      9.7%

Interactions

[Figures: 9 interaction plots, one for each ordered pair of the numeric variables df_index, interestedPartyStatementID and subjectStatementID]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one converts each variable to ranks and divides the covariance of the two rank variables by the product of their standard deviations.
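Following that description, ρ can be computed by ranking each variable and applying the covariance-over-standard-deviations formula to the ranks. Below is a minimal, dependency-free Python sketch. It assumes that tied values receive the average of their ranks (the usual convention) and that neither variable is constant. `average_ranks` and `spearman_rho` are illustrative names, not pandas-profiling's API.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ry) / n)
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(round(spearman_rho(x, y), 6))  # 1.0
```

Because y is a strictly increasing function of x, the ranks coincide and ρ is 1 even though the relationship is not linear.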

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and positive rescaling of the two variables. For an exact linear relationship Y = aX + b with a ≠ 0, r is therefore +1 or -1 depending only on the sign of a, not on how steep the line is.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
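The definition translates directly into code. Below is a minimal pure-Python sketch. The name `pearson_r` is illustrative, and the function assumes neither input is constant. Using 1/n or 1/(n-1) consistently in both the covariance and the standard deviations gives the same r, because the factor cancels. The example also shows the invariance property: lines with different positive slopes and offsets all give r = 1, and a negative slope gives r = -1.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    Assumes neither sequence is constant (otherwise a standard deviation is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
steep = [10 * v + 3 for v in x]     # exact linear relation, steep positive slope
shallow = [0.1 * v - 7 for v in x]  # different slope and offset, still positive
falling = [-2 * v for v in x]       # negative slope
print([round(pearson_r(x, y), 6) for y in (steep, shallow, falling)])  # [1.0, 1.0, -1.0]
```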

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
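The counting procedure can be sketched in a few lines of Python. This is a minimal O(n²) illustration of the formula as stated, which corresponds to the τ-a variant. Library implementations such as `scipy.stats.kendalltau` default to the tie-adjusted τ-b, so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total pairs, i.e. Kendall's tau-a.

    Pairs tied in x or y count as neither concordant nor discordant,
    but they still appear in the denominator n(n-1)/2.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# 5 concordant pairs, 1 discordant pair, 6 pairs in total.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```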

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
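Below is a minimal pure-Python sketch of a bias-corrected Cramér's V. It assumes the commonly cited form of Bergsma's correction: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and clipped at zero. The table dimensions r and k are then shrunk by (r-1)²/(n-1) and (k-1)²/(n-1) before taking the ratio. The chi-squared statistic is computed here without a continuity correction, and pandas-profiling's own implementation may differ in such details. The function and variable names are illustrative.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long nominal sequences (needs n > 1)."""
    n = len(x)
    row_tot, col_tot = Counter(x), Counter(y)
    observed = Counter(zip(x, y))
    r, k = len(row_tot), len(col_tot)
    # Pearson chi-squared statistic of the r x k contingency table (no continuity correction).
    chi2 = 0.0
    for a in row_tot:
        for b in col_tot:
            expected = row_tot[a] * col_tot[b] / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Subtract the expected positive bias of phi^2 and clip at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Shrink the table dimensions to match.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    # A table with a single row or column carries no association.
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


labels = ["a", "a", "b", "b"] * 25
matched = ["u", "u", "v", "v"] * 25   # perfectly associated with labels
unrelated = ["u", "v"] * 50           # every label/value combination occurs equally often
print(round(cramers_v_corrected(labels, matched), 6), cramers_v_corrected(labels, unrelated))  # 1.0 0.0
```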

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation for it is available.

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly spot visual patterns in data completion.]

Sample

First rows

df_indexinterestedPartyStatementIDinterestedPartyIsPersonsubjectStatementIDminimumShare
036012039954664013561264True235264571155043659825.0
126169027927498965800066True1361607085255572815150.0
22245479055635223948296True665824895977450643125.0
3469310685402616317734906True1086029522075708647525.0
457614566038486263607917True1647363164340703047775.0
516191179696533967079481True1674904616672714757350.0
61524830461716565545927True1438874476121278169675.0
750771481259068514272500True391947050303760663025.0
836324923272178238455813True213345499484749268825.0
9523010894855595172079268True1220438238987368969025.0

Last rows

df_indexinterestedPartyStatementIDinterestedPartyIsPersonsubjectStatementIDminimumShare
99011838657544021888051606False1822016252866711235875.0
991409814773808248550862599True222602718240080194325.0
99249625745977813030369120True259405858058690669425.0
9935520713813154471278157True1704348188905208810575.0
994154513508469925370910641True742061823291594445975.0
99545359120463668042900967True1030066272722512941725.0
99621713263478187579880481True564275881318664722275.0
99738359950833446970954170True1227077690137271435850.0
9985414140672557023582339True1774091427880828286525.0
99943894697176215894684290True1731174586289594450675.0